ABSTRACT

Computerized spelling programs, or "spelling checkers," can be a
wonderful tool for writers at any level of competence. However, they
should not be used as adjuncts to the teaching of writing unless they
meet two boundary conditions, one of size and one of design. The
problem with the design of the programs is that they allow for the
correction of typographical errors and misspellings without human
intervention, thus reducing the student to a passive key-pusher. The
problem with size is that most of the spelling dictionaries have a
limited working vocabulary. Other writing programs known as
"grammar-" or "style-checkers" call attention to incorrect usages,
redundancies, wordiness, meaningless intensifiers, gender-specific
terms, split compounds, cliches, and other solecisms common in bad
writing. Unfortunately, these programs are unregenerately
prescriptive, offering substitutions for nearly every phrase they
store. Better suited to the needs of a writer would be a software
package that analyzes text. However, the high number of computations
such a program would require renders such an idea impractical.
Writing software packages, if properly designed and applied, can
provide extensive text analysis. Unfortunately, much of the software
originates with commercial programmers rather than with experienced
classroom teachers. As such, it may, in fact, produce worse rather
than better writers. (HOD)

SOME BOUNDARY CONSIDERATIONS FOR WRITING SOFTWARE

John Thiesmeyer

In this paper I will briefly discuss a number of present and future
limitations on the design of software to be used in conjunction with word
processors to help teach writing. The limitations, or boundaries, I have in
mind are of several kinds: linguistic, pedagogical, mathematical,
occasionally even pragmatic. Let me make clear that my remarks are
intended primarily for a teaching-learning application above the elementary
level; in reference to a business, home or lower-school environment they
would need occasional qualification.

The decline of the text in our culture produces more and more people
who have learned vocabulary through the ear, not the eye, and whose phonetic
understanding is inadequate to a traditional English orthography. Barring
radical reform of the rules, writing teachers face an apparently Sisyphean
labor. Repeated marking of spelling errors on student papers is
demoralizing when we know the student is unlikely to profit by our labors.
As suggested in a recent N.Y. Times Magazine piece (February 26, 1984) on
the writing of very young children, moreover, our red ink may do actual
damage to the writer. "Snakes are dispikibel," wrote a first-grader, and
his principal pointed out that if the misspelling were immediately jumped
on, the youth might choose "bad" the next time to be safe. Surely the point
is good at any level of teaching: the more frantically we decry and punish
poor spelling the more we promote a monosyllabic and unadventurous prose.
Yet spelling ability declines each year among college freshmen.

Help is available in the form of "spelling checkers," a wonderful
utility for writers at any level of competence. Except for homonyms and
split compounds, they can eradicate typographical errors and misspellings
in a document. Because they protect rather than punish, the writer can use
an entire working vocabulary rather than choosing a safe but smaller
orthographic one. The teacher need never again confront a set of papers
averaging three or more misspellings per page; my own experience shows that
three or four spelling errors per set is an achievable goal.

Despite these promises, however, I would argue that spelling checkers
should not be used as adjuncts to the teaching of writing unless they meet
two boundary conditions, one of size and one of design.

Let me take up the design problem first. The writers of
spelling-checker programs have figured out that in addition to tagging
misspelled words their programs can easily be made to offer "corrections"
of one kind or another. If hte is a common typographical error for the, for
example, the checker can be programmed either to make the correct
substitution silently or to offer it as an option to the user, who need
only touch a key to have the replacement made. The logical extension is
obviously to incorporate a subsidiary dictionary of common typos and
misspellings (a dictionary that could be indefinitely large, in principle)
and make or offer corrections for all of them. Another strategy is to
store phonetic approximations along with correct English letterings as an
aid to the increasing numbers of aural spellers. A third approach is to
search the main dictionary list for words spelled like the word in
question, to some degree of approximation, and offer a menu of
possibilities for the user to choose among; thus if seperate turns up in a
text the menu will probably include the correct separate among its
suggestions. All three types of assistance are presently used in
commercial spelling programs, and sophisticated algorithms are being
developed that will eventually allow perhaps 90% or more of the
typographical errors and misspellings in a document to be corrected
automatically, without human intervention (see, e.g., Communications of
the Association for Computing Machinery [Comm ACM hereafter], Apr. 1984).
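
The first and third strategies can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the word list, the
typo table, the function name suggest and the similarity cutoff are all
hypothetical, and the approximate search uses the standard library's
difflib rather than whatever method a commercial checker employs. The
phonetic strategy is omitted.

```python
import difflib

# Hypothetical miniature word list; a real checker would hold tens of
# thousands of entries.
MAIN_DICTIONARY = ["separate", "desperate", "the", "they", "spelling", "checker"]

# Strategy 1: a subsidiary dictionary of common typos and misspellings.
COMMON_TYPOS = {"hte": "the", "teh": "the"}


def suggest(word, cutoff=0.8, limit=3):
    """Return candidate spellings, or [] if the word is already listed."""
    lowered = word.lower()
    if lowered in MAIN_DICTIONARY:
        return []
    if lowered in COMMON_TYPOS:
        return [COMMON_TYPOS[lowered]]
    # Strategy 3: search the main list for words spelled like this one,
    # to a degree of approximation set by the (assumed) cutoff.
    return difflib.get_close_matches(lowered, MAIN_DICTIONARY, n=limit, cutoff=cutoff)
```

Calling suggest("seperate") returns a short menu that includes separate;
whether the program substitutes silently, offers the menu, or merely flags
the word is a separate design decision.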

The problem with strategies like these is that they are thoroughly
anheuristic. Instead of promoting learning they deny initiative and
forestall thought: the student (or any user) is reduced to a passive
key-pusher, or even an uninvolved bystander. As a teacher, I would place
all checker programs promising "correction" beyond the pale, unless their
correcting features can be disabled for pedagogical use. If students are
forced to look up each rejected word there is hope that over time the
conventional orthographies may be learned; with optional or automatic
"correction," learning is precluded and the spelling problem in our culture
can only worsen.
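
A checker with its correcting features disabled reduces to something like
the following sketch, which reports rejected words and nothing more. The
function name and the tiny dictionary in the usage note are hypothetical;
the point is only that no replacement is ever offered.

```python
import re


def flag_unknown_words(text, dictionary):
    """Return words not found in dictionary, in order of first appearance."""
    rejected = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        # Report each unrecognized word once; never propose a replacement.
        if key not in dictionary and key not in rejected:
            rejected.append(key)
    return rejected
```

Given the sentence "Snakes are dispikibel" and a dictionary containing
snakes and are, the function returns only dispikibel, leaving the student
to look the word up.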

The question of appropriate size for a spelling dictionary is
interesting because there is confusion over just how many words we use. If
we accept the common estimate of around 15 thousand words as the typical
working vocabulary of a high-school senior (see E. L. Thorndike and I.
Lorge, The Teacher's Word Book of 30,000 Words, 1944), it would seem that a
dictionary of 25,000 words or so is ample to catch most misspellings. A
computer professionals' journal suggested three years ago that "a
dictionary of 10,000 words would be quite reasonable for a small community
of users" (Comm ACM, Dec. 1980). Programmers' practice generally concurs.
To instance only a few of the fifty or more spelling checkers currently
available, Commodore advertises Totl-Speller with 10,000 words; Aspen
Software (now Wang Electronic Publications) has 38,000 in its Proofreader;
Oasis Systems offers 45,000 with The Word Plus. Among the large computer
systems I know of, the spelling dictionaries on VAX/VMS minicomputers and
on mainframes using the Unix operating system have typically contained
20,000 words.

A more recent estimate is that the average high-school senior knows
about 7000 root words (H. F. Dupuy, The Rationale, Development and
Standardization of a Basic Word Vocabulary Test, 1974); the multiplier
effect of prefixes, suffixes and compounding would suggest that the average
working vocabulary is considerably greater than 15,000. Fifty years ago,
indeed, Leonard Bloomfield maintained that even uneducated adult speakers
use "somewhere round 20,000 to 30,000" words (Language, 1933).

I cannot refine these estimates for the individual student, but the
discovery I (and probably others) have made is that the working vocabulary
of a group of 20-odd quite ordinary college students is much closer to
100,000 than to 15,000 words. It follows that all the popular checkers
mentioned, along with most others now on the market, are grossly inadequate
for teaching purposes. This somewhat surprising assertion is based on my
experience with two sections of 20 students each using a spelling checker
(Radio Shack's Scripsit Dictionary) with a barely adequate 75,000 words.
That is the largest spelling dictionary I know of for "8-bit" micros, yet
the 400 papers my students wrote averaged about one common vocabulary word
apiece that the spelling dictionary did not recognize. The pedagogical
problem is that if, as a result of using an inadequate spelling program,
students are forced to look up a fair number of correctly spelled words,
they will soon become cynical about the process and prone to guess that
some of their incorrect spellings are correct and need not be looked up.
When that happens, the numbers of misspelled words will slowly creep upward
again in their papers.

I should add a final warning to those tempted to buy a spelling
checker for personal or classroom use. The often-advertised feature that
allows a user to "add as many words as desired" to a spelling checker is a
snare. The time involved in adding even two or three thousand correctly
spelled words to a checker dictionary, after first suspecting and then
checking to confirm they are missing, is prohibitively long, and since as I
have indicated most checkers are too small by half or more, making them
adequate is a task not even to be contemplated by a busy person.

For writing teachers concerned with the two kinds of limitations I have
described for spelling programs (a lower bound on their adequacy at around
80 or 100 thousand words, and a heuristic need to prevent mindless
correction features), the scope of the assault we face at programmers' hands
is best seen, perhaps, in contemplation of the best-known checkers.
Broderbund's Bank Street Speller, devised to accompany the widely praised
word-processing program for children, offers corrections; so does
Cornucopia's Electric Webster, proudly advertised in the words of a
reviewer as "the Cadillac of vocabulary programs"; the legendary Writer's
Workbench, which Bell Labs is said to be planning at last to issue in a
version suitable for microcomputers after lengthy mainframe development and
use, includes a spelling checker with only [figure missing] words; and
whoever purchases MicroPro's SpellStar to accompany their popular WordStar
writing program will get a mere 20,000 words for a list price of $250.

The next level of software that helps with writing is constituted by
the so-called "grammar-" or "style-checkers," which are neither. Like
spelling checkers they depend on stored dictionaries, in this case
primarily of phrases rather than individual words. Whereas spelling
dictionaries hold lists of correctly spelled words and call attention only
to spellings that don't find a match, the phrase checkers contain incorrect
usages (different than), redundancies (time period), wordiness (due to the
fact that), meaningless intensifiers (incredible), gender-specific terms
(mankind), split compounds (some what), cliches (pure and simple), and
other solecisms common in bad writing. Many words and phrases in these
dictionaries raise questions of taste or judgment rather than of outright
error. When a match is found between text and phrase dictionary the
potential mistake is reported to the user for reconsideration.
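
The matching step of such a phrase checker might look like the sketch
below. The dictionary is a hypothetical seven-entry table adapted from the
examples just given, and the function reports matches without prescribing
any substitution; it is not the method of any of the commercial programs
named in this paper.

```python
import re

# Hypothetical phrase dictionary: phrase -> category.
PHRASES = {
    "different than": "usage",
    "time period": "redundancy",
    "due to the fact that": "wordiness",
    "incredible": "intensifier",
    "mankind": "gender-specific term",
    "some what": "split compound",
    "pure and simple": "cliche",
}


def check_phrases(text):
    """Return (position, phrase, category) for each listed phrase found."""
    findings = []
    for phrase, category in PHRASES.items():
        # Whole-word, case-insensitive match of the stored phrase.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append((match.start(), phrase, category))
    return sorted(findings)
```

Each finding is returned to the writer for reconsideration; the decision
whether and how to revise stays with the writer.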

Usage programs can also check for some mechanical errors, like the
placing of a period or comma outside close-quotation marks, and can keep
tallies of words or phrases singled out for special attention, such as
copulative verbs or referential pronouns. Some will print a concordance of
the writer's text, useful for finding examples of excessive repetition.
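
Both facilities are simple to sketch. The two functions below are
hypothetical illustrations: the first lists the positions where a
quotation mark is immediately followed by a period or comma, and the
second tallies repeated words as a crude stand-in for a concordance.

```python
import re
from collections import Counter


def misplaced_punctuation(text):
    """Positions of a quotation mark directly followed by a period or comma."""
    return [match.start() for match in re.finditer(r'"[.,]', text)]


def repeated_words(text, minimum=2):
    """Words used at least `minimum` times, most frequent first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    repeated = [(word, n) for word, n in counts.items() if n >= minimum]
    return sorted(repeated, key=lambda item: (-item[1], item[0]))
```

A real concordance would also record where each word occurs; the tally
alone is enough to expose excessive repetition.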

Misused words and phrases constitute bad grammar only sometimes, and
their revision is not guaranteed to elevate style. The usage checkers do
have a salutary effect on student writing, however: beyond their efficacy
in removing many blighted locutions before the teacher has to respond to a
paper, they help considerably in getting the point across that extensive
analysis, revision and rewriting should always precede submission.

I don't propose to dwell at length on usage checkers here. My wife
Elaine (an English teacher and computer programmer) and I have been for
many months intensively involved in the development and use of such a
system for teaching, and I am reporting on it elsewhere at this conference.
Commercial examples include Electric Webster's "Grammar Option," Oasis
Systems' Punctuation + Style (with an excellent checker for mechanical
errors), and Wang Electronic Publications' Grammatik, all of which I have
studied, and one of the components of Writer's Workbench. All include
phrase dictionaries of between 600 and 750 items; the one we have been
using is about three times as large. (I estimate that the dictionary size
needs to be at least doubled again, to 4000-5000 entries, before the
usage-checker programs can be reasonably sure of catching most common
errors.)

The boundary consideration that I find applies in the evaluation of
such software for classroom and laboratory use is similar to one I raised
about spelling programs: the usage checkers are unregenerately
prescriptive, offering substitutions for nearly every phrase they store.
This is bad enough in its denial of the user's creative faculties, its
implication that for each misuse there are only one or sometimes two
appropriate corrections, its cultivation of a kind of bland,
lowest-common-denominator prose.

The problem is compounded when, as in the examples I have seen, the
phrase dictionaries were evidently put together in haste, and not by
experienced teachers or good writers. Thus in Grammatik the word busboy is
identified as a "gender specific [sic] term," which it is, accompanied by
the admonition to "use 'server,'" which is incorrect. Or we find excesses
like Electric Webster's message, each time the writer uses the word put,
that there is a similar word putt (EW calls it a "homonym," which it is
not); so for every golfer who cannot spell the final stroke there are
surely dozens, perhaps hundreds, of writers whose perfectly proper put's
would call forth specious messages. Again: in all checkers I have examined
a particular kind of oversight occurs repeatedly: a verb or verb phrase we
might all agree needs revision is listed, and therefore picked up by the
program, only in its first-person-present or infinitive form. Thus utilize
is flagged, and use is suggested to replace it, but the writer who uses
utilizes, utilized, utilizing, utilization or even utilizer is ignored.
Such oversights are the rule, not the exception, when programmers usurp the
role of writing teacher.
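
The oversight is easy to avoid if an entry stores a stem together with
its endings instead of a single form. The pattern below is a hypothetical
sketch for the one word discussed above, not the format of any existing
phrase dictionary.

```python
import re

# Hypothetical entry: one stem plus the endings a writer is likely to use.
UTILIZE = re.compile(r"\butiliz(?:e|es|ed|ing|er|ers|ation)\b", re.IGNORECASE)


def find_inflections(text):
    """Return every form of 'utilize' found in text, lowercased, in order."""
    return [match.group(0).lower() for match in UTILIZE.finditer(text)]
```

Applied to "She utilizes what he utilized," the function reports both
verbs, where a dictionary listing only utilize would report neither.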

Until software developers begin taking writers' needs more
thoughtfully into account, then, the teacher wishing to experiment with a
usage checker should make sure its phrases can be extensively edited and
supplemented, its prescriptions suppressed. Unlike Electric Webster's
"Grammar Option" it should also, by the way, allow its messages to be
printed out: a cathode-ray tube with a few lines of text, error message
and a blinking cursor is no proper environment for thoughtful revision of
one's work. I may say that Grammatik, which is the much-modified heart of
our own system, is quite brilliantly designed to allow such options, though
there are serious deficiencies in its analyses of English usage and
mechanics.

Beyond the levels of word and phrase checking already incorporated in
writing software is the level of what might be called "phrase pattern," at
which, for example, a sentence including the words not only would be
checked to make sure it had a following but or but also. Conversely, a
sentence containing one, everyone, or person and a subsequent they, them,
their would be gently queried about agreement. So far as I know, text
analysis on this level has scarcely begun among academic programmers, much
less commercial software developers; I have been accumulating instances of
such patterns and we hope to develop a program this summer to incorporate
them.
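
The two patterns just described can be sketched as follows. The function
name and the wording of the queries are hypothetical, and the sketch works
on one sentence at a time with no knowledge of grammar; it simply looks
for the second element of each pattern after the first.

```python
import re


def query_sentence(sentence):
    """Return gentle queries raised by two illustrative phrase patterns."""
    queries = []
    lowered = sentence.lower()
    # Pattern 1: "not only" should be followed later by "but" or "but also".
    opener = re.search(r"\bnot only\b", lowered)
    if opener and not re.search(r"\bbut\b", lowered[opener.end():]):
        queries.append('"not only" without a following "but"')
    # Pattern 2: a singular antecedent followed later by a plural pronoun.
    antecedent = re.search(r"\b(one|everyone|person)\b", lowered)
    if antecedent and re.search(r"\b(they|them|their)\b", lowered[antecedent.end():]):
        queries.append("possible agreement problem: singular antecedent, plural pronoun")
    return queries
```

A query is a question put to the writer, not a verdict: "Everyone should
bring their book" is flagged for reconsideration, and the writer decides
whether to revise.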

Despite my criticisms and reservations I have high hopes for writing
software. Properly designed and applied, it can provide extensive text
analysis to the student under circumstances which encourage revision, and
can do so without preempting the student's own initiative or creativity.
Although hard evidence of long-term benefits is yet to accumulate,
classroom experience so far is encouraging. The hope that some of the
pitfalls I have outlined can be avoided simply by leaving writing-software
development in academic hands is dashed, however, by the reflection that
the probable future of word processing lies not with academic mainframes or
minis but with microcomputers, and that much of the writing software
students use will therefore originate with commercial programmers. I hope
I have adequately communicated my sense that commercially produced
computer-assisted instruction in writing may well produce worse rather than
better writers.

As a final consideration for this paper it is worth raising the
question just how sophisticated we might expect computer text-analysis
eventually to become. Will we see full-fledged syntax software unerringly
picking out sentence fragments, comma splices, dangling modifiers,
improperly formed possessives, failures of agreement between subject and
verb, noun and pronoun? For several years tantalizing reports have come
out of IBM research labs at Yorktown Heights, N.Y., of a text processor
called "Epistle" that has remarkable powers of syntactic analysis (see
"Studies in Text Processing," IBM Research Highlights, Oct. 1981, and
Forbes, Aug. 15, 1983). A recent report on Japan's "Fifth-Generation
Project" to become world leader in advanced computer technology indicates
that by the early 1990's the Japanese hope to have a system with a
"vocabulary of up to 10,000 words, 2000 grammar rules, and 99% accuracy in
syntactic analysis of written natural language. . . ." The system will
employ computers capable of performing up to a billion logical inferences
per second, perhaps 30,000 times faster than today's best machines (Comm
ACM, Sept. 1983).

In the face of such prospects, who would dare suggest that
natural-language analysis approximating that of a human expert will not
soon be carried out by machines? I would, for one. I cannot prove the
case definitively, but with the help of a travelling-salesman story I owe
to my colleague Richard Decker in Mathematics, I will offer a strong
conjecture.

It has been shown mathematically that a "context-free" language can be
"computed," that is, it can in principle be generated and analyzed by a
computer (see Stephen A. Cook's Turing Award Lecture, "An Overview of
Computational Complexity," Comm ACM, June 1983). No such proof exists for
context-bound languages. English is not context-free, as a pair of
examples will show. "'How time flies,' sighed Susy's mother as the child,
using her birthday stopwatch, learned how to time flies." In the context
of the one small word to, the words time and flies exchange their parts of
speech. Again: the words does she almost invariably indicate a question,
unless the prior context includes a phrase starting with only: "Only after
long deliberation does she invest in hog futures."

Though but two of a myriad examples of the ways in which word
arrangements alter meaning in English, these should suffice to indicate how
very many special rules and exceptions, in addition to the complicated
taxonomies of our simple, compound, complex and compound-complex forms,
would be needed for a full syntactic description of the language.

The lexical ambiguities of our language mean, furthermore, that in
addition to a very large body of rules, a competent sentence analyzer would
need a dictionary that not only listed but labelled each word according to
its allowable syntactic functions. A sentence analysis might then involve
trying all permitted lexical functions of its words in order to match its
structure against the stored rules. All examples I have seen of
natural-language parsers use labelled lexicons of this kind (a good example
of what is being attempted is Jane Robinson's "Diagram: A Grammar for
Dialogues," Comm ACM, Jan. 1982). As we shall see in a moment, the
discovery of an apparent lower limit on the size of effective spelling
checkers has considerable bearing on the practical possibilities for
natural-language analysis.
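
A rough sense of the work involved in trying all permitted lexical
functions can be had from the sketch below. The four-word lexicon and its
labels are hypothetical, chosen only to echo the time flies example, and
the code merely enumerates label combinations; it does not match them
against any grammar rules.

```python
from itertools import product
from math import prod

# Hypothetical labelled lexicon: word -> allowable syntactic functions.
LEXICON = {
    "how": ["adverb"],
    "time": ["noun", "verb"],
    "flies": ["noun", "verb"],
    "to": ["preposition", "infinitive marker"],
}


def label_assignments(words):
    """Every combination of permitted labels for the given words."""
    return list(product(*(LEXICON[word] for word in words)))


def count_assignments(words):
    """Number of combinations: the product of the label counts per word."""
    return prod(len(LEXICON[word]) for word in words)
```

Even here the count doubles with each ambiguous word: "how time flies"
yields 4 labellings and "how to time flies" yields 8. A sentence of
twenty words with two labels apiece would yield over a million, before a
single rule had been tried.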

For a dramatic illustration of what happens when a computer must try
many arrangements or combinations of elements to find a desired solution, I
offer the travelling-salesman problem. Consider the following "real-world"
situation. A company specializing in the manufacture of very large computer
systems has sales offices in 40 American cities. The sales manager wishes
to visit each office on an annual inspection tour. Fuel prices are high,
and time is money. Can the company computers calculate the shortest route
among the forty cities?

Intuition suggests that a forty-city route should be a simple matter
for a modern mainframe number-cruncher. Mathematics replies that the
problem cannot be solved in practice: the number of alternate routes to be
compared is greater than 4 X 10^47, and the fastest conceivable computer
could not complete the calculation during the estimated life of the
universe.
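
The arithmetic can be checked with a short calculation. The sketch
assumes that a route is an ordering of all 40 cities and that a route and
its reverse count as one, which gives 40!/2 orderings; the machine speed
and the age of the universe are round illustrative figures, not values
taken from this paper.

```python
from math import factorial

CITIES = 40
ROUTES_PER_SECOND = 10**9        # assumed speed: a billion routes a second
UNIVERSE_AGE_SECONDS = 4.3e17    # assumed: roughly 13.8 billion years


def route_count(cities):
    """Orderings of the cities, counting a route and its reverse once."""
    return factorial(cities) // 2


routes = route_count(CITIES)
seconds_needed = routes / ROUTES_PER_SECOND
print(f"routes to compare: {routes:.3e}")
print(f"universe lifetimes needed: {seconds_needed / UNIVERSE_AGE_SECONDS:.1e}")
```

Under these assumptions the count is a little over 4 X 10^47, and
exhaustive comparison at a billion routes a second would take vastly
longer than the assumed age of the universe.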

As the travelling-salesman problem illustrates, when the number of
items goes up the number of possible combinations of items increases
enormously. I know of no one who has worked out the mathematics for word
combinations and sentence types, but reflection suggests that the
computational requirements will be high. If the working vocabulary of a
group of ordinary teenagers is somewhere around 100,000 words, we can see
that the 10,000-word lexicon anticipated by the Japanese for their
natural-language analyzer may be an order of magnitude too small, even for
a limited speech community. If the number of words to be checked and the
number of phrase and sentence rules are high enough, I conjecture that the
sheer magnitude of the computational task will render the dream of a
program that identifies all, or even most, incorrect sentences impractical
for a long while to come.




